ABSTRACT

This paper proposes a class of estimators for the population correlation coefficient
in two-phase sampling when information about the population mean and population
variance of one of the variables is not available but information about these parameters
of another (auxiliary) variable is available, and analyzes its properties. The optimum
estimator in the class is identified together with its variance formula. The estimators of
the class involve unknown constants whose optimum values depend on unknown population
parameters. Following Singh (1982) and Srivastava and Jhajj (1983), it is shown
that when these population parameters are replaced by their consistent estimates, the
resulting class of estimators has the same asymptotic variance as that of the optimum
estimator. An empirical study is carried out to demonstrate the performance of the
constructed estimators.
1. Introduction

Consider a finite population $U = \{1, 2, \ldots, i, \ldots, N\}$. Let y and x be the study and auxiliary
variables taking values $y_i$ and $x_i$ respectively for the ith unit. The correlation coefficient
between y and x is defined by

$$\rho_{yx} = S_{yx}/(S_y S_x) \tag{1.1}$$

where

$$S_{yx} = (N-1)^{-1}\sum_{i=1}^{N}(y_i-\bar{Y})(x_i-\bar{X}),\quad S_x^2 = (N-1)^{-1}\sum_{i=1}^{N}(x_i-\bar{X})^2,\quad S_y^2 = (N-1)^{-1}\sum_{i=1}^{N}(y_i-\bar{Y})^2,$$

$$\bar{X} = N^{-1}\sum_{i=1}^{N}x_i,\quad \bar{Y} = N^{-1}\sum_{i=1}^{N}y_i.$$

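Based on a simple random sample of size n drawn without replacement,
$(x_i, y_i)$, i = 1, 2, ..., n, the usual estimator of $\rho_{yx}$ is the corresponding sample correlation
coefficient:

$$r = s_{yx}/(s_x s_y) \tag{1.2}$$

where

$$s_{yx} = (n-1)^{-1}\sum_{i=1}^{n}(y_i-\bar{y})(x_i-\bar{x}),\quad s_x^2 = (n-1)^{-1}\sum_{i=1}^{n}(x_i-\bar{x})^2,$$

$$s_y^2 = (n-1)^{-1}\sum_{i=1}^{n}(y_i-\bar{y})^2,\quad \bar{y} = n^{-1}\sum_{i=1}^{n}y_i,\quad \bar{x} = n^{-1}\sum_{i=1}^{n}x_i.$$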
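As a minimal sketch of the sample correlation coefficient in (1.2), the following Python function computes r with the (n - 1) divisors used above. The function name `sample_correlation` is illustrative and not part of the paper.

```python
import math


def sample_correlation(x, y):
    """Return r = s_yx / (s_x * s_y) with (n - 1) divisors, as in (1.2)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length n >= 2")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sample covariance and variances of the paired observations.
    s_yx = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y)) / (n - 1)
    s_x2 = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
    s_y2 = sum((yi - y_bar) ** 2 for yi in y) / (n - 1)
    return s_yx / math.sqrt(s_x2 * s_y2)
```

For a perfectly linear pair such as x = (1, 2, 3, 4), y = (2, 4, 6, 8) the function returns 1 up to rounding error.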

The problem of estimating $\rho_{yx}$ has been taken up earlier by various authors including
Koop (1970), Gupta et al. (1978, 79), Wakimoto (1971), Gupta and Singh (1989), Rana
(1989) and Singh et al. (1996) in different situations. Srivastava and Jhajj (1986) have
further considered the problem of estimating $\rho_{yx}$ in situations where information
on the auxiliary variable x is available for all units in the population. In such situations, they
have suggested a class of estimators for $\rho_{yx}$ which utilizes the known values of the
population mean $\bar{X}$ and the population variance $S_x^2$ of the auxiliary variable x.

In this paper, using a two-phase sampling mechanism, a class of estimators for $\rho_{yx}$
in the presence of the available knowledge ($\bar{Z}$ and $S_z^2$) on a second auxiliary variable z
is considered, when the population mean $\bar{X}$ and population variance $S_x^2$ of the main
auxiliary variable x are not known.

2. The Suggested Class of Estimators 

In many situations of practical importance, it may happen that no information is
available on the population mean $\bar{X}$ and population variance $S_x^2$. We then seek to estimate the
population correlation coefficient $\rho_{yx}$ from a sample 's' obtained through a two-phase
selection. Allowing simple random sampling without replacement in each phase,
the two-phase sampling scheme is as follows:

(i) The first-phase sample $s^*$ ($s^* \subset U$) of fixed size $n_1$ is drawn to observe only x in
order to furnish good estimates of $\bar{X}$ and $S_x^2$.

(ii) Given $s^*$, the second-phase sample s ($s \subset s^*$) of fixed size n is drawn to
observe y only.
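The two selection steps above can be sketched in Python as follows. This is an illustration only: the function name `draw_two_phase_sample`, the unit labels 1..N and the `seed` argument are assumptions made for the example, not notation from the paper.

```python
import random


def draw_two_phase_sample(N, n1, n, seed=None):
    """Draw s* (size n1) from U = {1..N} and then s (size n) from s*,
    using simple random sampling without replacement in each phase."""
    if not (0 < n <= n1 <= N):
        raise ValueError("sizes must satisfy 0 < n <= n1 <= N")
    rng = random.Random(seed)
    # Phase 1: SRSWOR of n1 unit labels from the population.
    first_phase = rng.sample(range(1, N + 1), n1)
    # Phase 2: SRSWOR of n unit labels from the first-phase sample.
    second_phase = rng.sample(first_phase, n)
    return first_phase, second_phase
```

With the sizes used later in the empirical study, `draw_two_phase_sample(80, 25, 10)` returns 25 first-phase labels and 10 second-phase labels, the latter all contained in the former.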

Let

$$\bar{x} = (1/n)\sum_{i\in s}x_i,\quad \bar{y} = (1/n)\sum_{i\in s}y_i,\quad \bar{x}^* = (1/n_1)\sum_{i\in s^*}x_i,\quad s_x^2 = (n-1)^{-1}\sum_{i\in s}(x_i-\bar{x})^2,$$

$$s_x^{*2} = (n_1-1)^{-1}\sum_{i\in s^*}(x_i-\bar{x}^*)^2.$$

We write $u = \bar{x}/\bar{x}^*$, $v = s_x^2/s_x^{*2}$. Whatever be the sample chosen, let (u, v) assume values in
a bounded closed convex subset, R, of the two-dimensional real space containing the
point (1, 1). Let h(u, v) be a function of u and v such that

$$h(1,1) = 1 \tag{2.1}$$

and such that it satisfies the following conditions:

1. The function h(u, v) is continuous and bounded in R.

2. The first and second partial derivatives of h(u, v) exist and are continuous and
bounded in R.




Now one may consider the class of estimators of $\rho_{yx}$ defined by

$$\hat{\rho}_{hd} = r\,h(u,v) \tag{2.2}$$

which is the double sampling version of the class of estimators

$$r\,f(u^*, v^*)$$

suggested by Srivastava and Jhajj (1986), where $u^* = \bar{x}/\bar{X}$, $v^* = s_x^2/S_x^2$ and $(\bar{X}, S_x^2)$ are
known.
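To make the construction in (2.2) concrete, here is a small Python sketch that computes u, v and an estimator of the form r h(u, v). The choice h(u, v) = u^a v^b is only one hypothetical function satisfying h(1, 1) = 1; it is assumed here for illustration and is not singled out by the paper. The function name and argument names are likewise illustrative.

```python
import math
import statistics


def rho_hd(x_first, x_second, y_second, a=0.0, b=0.0):
    """Return (estimate, r, u, v) for the estimator r * h(u, v).

    x_first  : x values on the first-phase sample s*
    x_second : x values on the second-phase sample s
    y_second : y values on the second-phase sample s
    h(u, v) = u**a * v**b is an assumed example with h(1, 1) = 1.
    """
    n = len(x_second)
    x_bar_star = statistics.mean(x_first)
    s2_x_star = statistics.variance(x_first)   # (n1 - 1) divisor
    x_bar = statistics.mean(x_second)
    y_bar = statistics.mean(y_second)
    s2_x = statistics.variance(x_second)       # (n - 1) divisor
    s2_y = statistics.variance(y_second)
    s_yx = sum((yi - y_bar) * (xi - x_bar)
               for xi, yi in zip(x_second, y_second)) / (n - 1)
    r = s_yx / math.sqrt(s2_x * s2_y)
    u = x_bar / x_bar_star
    v = s2_x / s2_x_star
    return r * (u ** a) * (v ** b), r, u, v
```

Setting a = b = 0 gives h = 1 and the estimator reduces to the ordinary sample correlation coefficient r.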

Sometimes, even if the population mean $\bar{X}$ and population variance $S_x^2$ of x are
not known, information on a cheaply ascertainable variable z, closely related to x but,
compared to x, remotely related to y, is available on all units of the population. This type
of situation has been briefly discussed by, among others, Chand (1975) and Kiregyera (1980,
84).

Following Chand (1975) one may define a chain ratio-type estimator for $\rho_{yx}$ as






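$$\hat{\rho}_{1d} = r\left(\frac{\bar{x}^*}{\bar{x}}\right)\left(\frac{s_x^{*2}}{s_x^2}\right)\left(\frac{\bar{Z}}{\bar{z}^*}\right)\left(\frac{S_z^2}{s_z^{*2}}\right) \tag{2.3}$$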



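where the population mean $\bar{Z}$ and population variance $S_z^2$ of the second auxiliary variable z
are known, and

$$\bar{z}^* = (1/n_1)\sum_{i\in s^*}z_i,\quad s_z^{*2} = (n_1-1)^{-1}\sum_{i\in s^*}(z_i-\bar{z}^*)^2$$

are the sample mean and sample variance of z based on the preliminary large sample $s^*$ of
size $n_1$ (> n).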



The estimator $\hat{\rho}_{1d}$ in (2.3) may be generalized as

9id 
(2.4)



where the $\alpha_i$'s (i = 1, 2, 3, 4) are suitably chosen constants.

Many other generalizations of $\hat{\rho}_{1d}$ are possible. We have, therefore, considered a
more general class of estimators of $\rho_{yx}$ from which a number of estimators can be generated.

The proposed generalized estimator for the population correlation coefficient $\rho_{yx}$ is
defined by

$$\hat{\rho}_{td} = r\,t(u,v,w,a) \tag{2.5}$$

where $w = \bar{z}^*/\bar{Z}$, $a = s_z^{*2}/S_z^2$ and t(u, v, w, a) is a function of (u, v, w, a) such that

$$t(1,1,1,1) = 1 \tag{2.6}$$

satisfying the following conditions:

(i) Whatever be the samples ($s^*$ and s) chosen, let (u, v, w, a) assume values in a closed
convex subset S of the four-dimensional real space containing the point P = (1, 1, 1, 1).

(ii) In S, the function t(u, v, w, a) is continuous and bounded.

(iii) The first and second order partial derivatives of t(u, v, w, a) exist and are
continuous and bounded in S.

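To find the bias and variance of $\hat{\rho}_{td}$ we write

$$s_y^2 = S_y^2(1+e_0),\quad \bar{x} = \bar{X}(1+e_1),\quad \bar{x}^* = \bar{X}(1+e_1^*),\quad s_x^2 = S_x^2(1+e_2),$$

$$s_x^{*2} = S_x^2(1+e_2^*),\quad \bar{z}^* = \bar{Z}(1+e_3^*),\quad s_z^{*2} = S_z^2(1+e_4^*),\quad s_{yx} = S_{yx}(1+e_5)$$

such that $E(e_0) = E(e_1) = E(e_2) = E(e_5) = 0$ and $E(e_i^*) = 0\ \forall\ i = 1, 2, 3, 4$,

and ignoring the finite population correction terms, we write to the first degree of
approximation

$$E(e_0^2) = (\delta_{400}-1)/n,\quad E(e_1^2) = C_x^2/n,\quad E(e_1^{*2}) = C_x^2/n_1,\quad E(e_2^2) = (\delta_{040}-1)/n,$$

$$E(e_2^{*2}) = (\delta_{040}-1)/n_1,\quad E(e_3^{*2}) = C_z^2/n_1,\quad E(e_4^{*2}) = (\delta_{004}-1)/n_1,$$

$$E(e_5^2) = \{(\delta_{220}/\rho_{yx}^2)-1\}/n,\quad E(e_0e_1) = \delta_{210}C_x/n,\quad E(e_0e_1^*) = \delta_{210}C_x/n_1,$$

$$E(e_0e_2) = (\delta_{220}-1)/n,\quad E(e_0e_2^*) = (\delta_{220}-1)/n_1,\quad E(e_0e_3^*) = \delta_{201}C_z/n_1,$$

$$E(e_0e_4^*) = (\delta_{202}-1)/n_1,\quad E(e_0e_5) = \{(\delta_{310}/\rho_{yx})-1\}/n,$$

$$E(e_1e_1^*) = C_x^2/n_1,\quad E(e_1e_2) = \delta_{030}C_x/n,\quad E(e_1e_2^*) = \delta_{030}C_x/n_1,$$

$$E(e_1e_3^*) = \rho_{xz}C_xC_z/n_1,\quad E(e_1e_4^*) = \delta_{012}C_x/n_1,\quad E(e_1e_5) = (\delta_{120}C_x/\rho_{yx})/n,$$

$$E(e_1^*e_2) = \delta_{030}C_x/n_1,\quad E(e_1^*e_2^*) = \delta_{030}C_x/n_1,\quad E(e_1^*e_3^*) = \rho_{xz}C_xC_z/n_1,$$

$$E(e_1^*e_4^*) = \delta_{012}C_x/n_1,\quad E(e_1^*e_5) = (\delta_{120}C_x/\rho_{yx})/n_1,$$

$$E(e_2e_2^*) = (\delta_{040}-1)/n_1,\quad E(e_2e_3^*) = \delta_{021}C_z/n_1,\quad E(e_2e_4^*) = (\delta_{022}-1)/n_1,$$

$$E(e_2e_5) = \{(\delta_{130}/\rho_{yx})-1\}/n,\quad E(e_2^*e_3^*) = \delta_{021}C_z/n_1,$$

$$E(e_2^*e_4^*) = (\delta_{022}-1)/n_1,\quad E(e_2^*e_5) = \{(\delta_{130}/\rho_{yx})-1\}/n_1,$$

$$E(e_3^*e_4^*) = \delta_{003}C_z/n_1,\quad E(e_3^*e_5) = (\delta_{111}C_z/\rho_{yx})/n_1,$$

$$E(e_4^*e_5) = \{(\delta_{112}/\rho_{yx})-1\}/n_1,$$

where

$$\delta_{pqm} = \mu_{pqm}/(\mu_{200}^{p/2}\,\mu_{020}^{q/2}\,\mu_{002}^{m/2}),\quad \mu_{pqm} = (1/N)\sum_{i=1}^{N}(y_i-\bar{Y})^p(x_i-\bar{X})^q(z_i-\bar{Z})^m,$$

(p, q, m) being non-negative integers.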
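The standardized moments $\delta_{pqm}$ defined above are straightforward to compute when the population values are available. The following Python sketch is illustrative; the function names `mu` and `delta` are assumptions made for the example.

```python
def mu(y, x, z, p, q, m):
    """Central product moment mu_pqm with divisor N."""
    N = len(y)
    y_bar, x_bar, z_bar = sum(y) / N, sum(x) / N, sum(z) / N
    return sum((yi - y_bar) ** p * (xi - x_bar) ** q * (zi - z_bar) ** m
               for yi, xi, zi in zip(y, x, z)) / N


def delta(y, x, z, p, q, m):
    """Standardized moment delta_pqm = mu_pqm / (mu_200^(p/2) mu_020^(q/2) mu_002^(m/2))."""
    m200 = mu(y, x, z, 2, 0, 0)
    m020 = mu(y, x, z, 0, 2, 0)
    m002 = mu(y, x, z, 0, 0, 2)
    scale = m200 ** (p / 2) * m020 ** (q / 2) * m002 ** (m / 2)
    return mu(y, x, z, p, q, m) / scale
```

By construction `delta(y, x, z, 2, 0, 0)` equals 1, and `delta(y, x, z, 1, 1, 0)` is the population correlation between y and x computed with divisor N.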

To find the expectation and variance of $\hat{\rho}_{td}$, we expand t(u, v, w, a) about the point
P = (1, 1, 1, 1) in a second-order Taylor's series, and express this value and the value of r in
terms of e's. Expanding in powers of e's and retaining terms up to second power, we
have

$$E(\hat{\rho}_{td}) = \rho_{yx} + O(n^{-1}) \tag{2.7}$$

which shows that the bias of $\hat{\rho}_{td}$ is of the order $n^{-1}$ and so, up to order $n^{-1}$, the mean square
error and the variance of $\hat{\rho}_{td}$ are the same.
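The first-order step behind this expansion can be written out explicitly. The sketch below uses the relative errors $e_0, e_1, e_1^*, e_2, e_2^*, e_3^*, e_4^*, e_5$ of $s_y^2, \bar{x}, \bar{x}^*, s_x^2, s_x^{*2}, \bar{z}^*, s_z^{*2}, s_{yx}$ respectively, and keeps only linear terms; $t_j(P)$ denotes the jth first partial derivative of t at P.

```latex
\begin{align*}
r &= \frac{s_{yx}}{s_x s_y}
   = \rho_{yx}\,(1+e_5)(1+e_0)^{-1/2}(1+e_2)^{-1/2}
   \approx \rho_{yx}\left[1 + \tfrac{1}{2}(2e_5 - e_0 - e_2)\right],\\
u &= \frac{1+e_1}{1+e_1^*} \approx 1 + e_1 - e_1^*, \qquad
v = \frac{1+e_2}{1+e_2^*} \approx 1 + e_2 - e_2^*, \qquad
w = 1 + e_3^*, \qquad a = 1 + e_4^*,\\
\hat{\rho}_{td} - \rho_{yx}
  &\approx \rho_{yx}\left[\tfrac{1}{2}(2e_5 - e_0 - e_2)
     + t_1(P)(e_1 - e_1^*) + t_2(P)(e_2 - e_2^*)
     + t_3(P)\,e_3^* + t_4(P)\,e_4^*\right].
\end{align*}
```

Since every e has zero expectation, the linear part contributes nothing to the bias, which is why the bias comes only from the second-order terms and is of order $n^{-1}$.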

Expanding $(\hat{\rho}_{td}-\rho_{yx})^2$, retaining terms up to second power in e's, taking
expectation and using the above expected values, we obtain the variance of $\hat{\rho}_{td}$ to the
first degree of approximation, as

$$\begin{aligned}
Var(\hat{\rho}_{td}) = Var(r) &+ (\rho_{yx}^2/n)\,[C_x^2t_1^2(P) + (\delta_{040}-1)t_2^2(P) - At_1(P) - Bt_2(P) + 2\delta_{030}C_xt_1(P)t_2(P)]\\
&- (\rho_{yx}^2/n_1)\,[C_x^2t_1^2(P) + (\delta_{040}-1)t_2^2(P) - C_z^2t_3^2(P) - (\delta_{004}-1)t_4^2(P) - At_1(P)\\
&\qquad - Bt_2(P) + Dt_3(P) + Ft_4(P) + 2\delta_{030}C_xt_1(P)t_2(P) - 2\delta_{003}C_zt_3(P)t_4(P)]
\end{aligned} \tag{2.8}$$

where $t_1(P)$, $t_2(P)$, $t_3(P)$ and $t_4(P)$ respectively denote the first partial derivatives of
t(u, v, w, a) with respect to u, v, w and a at the point P = (1, 1, 1, 1),

$$Var(r) = (\rho_{yx}^2/n)\,[(\delta_{220}/\rho_{yx}^2) + (1/4)(\delta_{040}+\delta_{400}+2\delta_{220}) - \{(\delta_{130}+\delta_{310})/\rho_{yx}\}] \tag{2.9}$$

$$A = \{\delta_{210}+\delta_{030}-2(\delta_{120}/\rho_{yx})\}C_x,\quad B = \{\delta_{220}+\delta_{040}-2(\delta_{130}/\rho_{yx})\},$$

$$D = \{\delta_{201}+\delta_{021}-2(\delta_{111}/\rho_{yx})\}C_z,\quad F = \{\delta_{202}+\delta_{022}-2(\delta_{112}/\rho_{yx})\}.$$

Any parametric function t(u, v, w, a) satisfying (2.6) and the conditions (i) to (iii) can
generate an estimator of the class (2.5).
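As a quick numerical check on the variance of r in (2.9), the following Python sketch evaluates the formula from the standardized moments. The function name `var_r` is illustrative. The check values used in the note after the code are the standard moments of a bivariate normal distribution, which is an assumption made only for this illustration and not a model adopted in the paper.

```python
def var_r(n, rho, d220, d040, d400, d130, d310):
    """First-order variance of the sample correlation coefficient, eq. (2.9).

    rho is the population correlation; dXYZ stands for delta_XYZ.
    """
    bracket = (d220 / rho ** 2
               + 0.25 * (d040 + d400 + 2.0 * d220)
               - (d130 + d310) / rho)
    return (rho ** 2 / n) * bracket
```

For a bivariate normal population one has $\delta_{040} = \delta_{400} = 3$, $\delta_{220} = 1 + 2\rho^2$ and $\delta_{130} = \delta_{310} = 3\rho$, and the expression collapses to the familiar $(1-\rho^2)^2/n$.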



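The variance of $\hat{\rho}_{td}$ at (2.8) is minimized for

$$t_1(P) = \frac{A(\delta_{040}-1) - B\,\delta_{030}C_x}{2C_x^2(\delta_{040}-\delta_{030}^2-1)} = \alpha \text{ (say)},\qquad
t_2(P) = \frac{B\,C_x^2 - A\,\delta_{030}C_x}{2C_x^2(\delta_{040}-\delta_{030}^2-1)} = \beta \text{ (say)},$$

$$t_3(P) = \frac{D(\delta_{004}-1) - F\,\delta_{003}C_z}{2C_z^2(\delta_{004}-\delta_{003}^2-1)} = \gamma \text{ (say)},\qquad
t_4(P) = \frac{C_z^2F - D\,\delta_{003}C_z}{2C_z^2(\delta_{004}-\delta_{003}^2-1)} = \delta \text{ (say)}. \tag{2.10}$$

Thus the resulting (minimum) variance of $\hat{\rho}_{td}$ is given by

$$\min.Var(\hat{\rho}_{td}) = Var(r) - \left(\frac{1}{n}-\frac{1}{n_1}\right)\rho_{yx}^2\left[\frac{A^2}{4C_x^2} + \frac{\{(A/C_x)\delta_{030}-B\}^2}{4(\delta_{040}-\delta_{030}^2-1)}\right] - \frac{\rho_{yx}^2}{n_1}\left[\frac{D^2}{4C_z^2} + \frac{\{(D/C_z)\delta_{003}-F\}^2}{4(\delta_{004}-\delta_{003}^2-1)}\right] \tag{2.11}$$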



It is observed from (2.11) that if the optimum values of the parameters given by
(2.10) are used, the variance of the estimator $\hat{\rho}_{td}$ does not exceed that of r, as the last
two terms on the right-hand side of (2.11) are non-negative.
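The optimum values and the variance reduction can be checked numerically. The variance to be minimized is a quadratic in each pair of derivatives, so the sketch below solves the two normal equations for one pair and returns the resulting reduction term. The helper name `optimum_pair` and its argument names are illustrative assumptions, not notation from the paper.

```python
def optimum_pair(lin1, lin2, var1, var2, cov12):
    """Minimise  var1*t1**2 + var2*t2**2 + 2*cov12*t1*t2 - lin1*t1 - lin2*t2.

    Returns (t1, t2, reduction), where reduction is minus the minimum value,
    i.e. the bracketed quantity subtracted in the minimum-variance formula.
    """
    det = var1 * var2 - cov12 ** 2
    if det <= 0:
        raise ValueError("quadratic form must be positive definite")
    # Solution of the two normal equations by Cramer's rule.
    t1 = (lin1 * var2 - lin2 * cov12) / (2.0 * det)
    t2 = (lin2 * var1 - lin1 * cov12) / (2.0 * det)
    reduction = (lin1 ** 2 * var2 - 2.0 * lin1 * lin2 * cov12
                 + lin2 ** 2 * var1) / (4.0 * det)
    return t1, t2, reduction
```

For the (u, v) pair one would call `optimum_pair(A, B, C_x**2, delta_040 - 1, delta_030 * C_x)`, and for the (w, a) pair `optimum_pair(D, F, C_z**2, delta_004 - 1, delta_003 * C_z)`. In the first case the returned reduction equals $A^2/(4C_x^2) + \{(A/C_x)\delta_{030}-B\}^2/\{4(\delta_{040}-\delta_{030}^2-1)\}$, which is the first bracket of the minimum variance.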

Two simple functions t(u, v, w, a) satisfying the required conditions are

$$t(u,v,w,a) = 1 + \alpha_1(u-1) + \alpha_2(v-1) + \alpha_3(w-1) + \alpha_4(a-1)$$
t{u,v,w,a) - 

and for both these functions $t_1(P) = \alpha_1$, $t_2(P) = \alpha_2$, $t_3(P) = \alpha_3$ and $t_4(P) = \alpha_4$. Thus one
should use the optimum values of $\alpha_1$, $\alpha_2$, $\alpha_3$ and $\alpha_4$ in $\hat{\rho}_{td}$ to get the minimum variance. It is
to be noted that the estimator $\hat{\rho}_{td}$ attains the minimum variance only when the optimum
values of the constants $\alpha_i$ (i = 1, 2, 3, 4), which are functions of unknown population
parameters, are known. To use such estimators in practice, one has to use some guessed
values of the population parameters obtained either through past experience or through a
pilot sample survey. It may be further noted that even if the values of the constants used
in the estimator are not exactly equal to their optimum values as given by (2.10) but are
close enough, the resulting estimator will be better than the conventional estimator, as has
been illustrated by Das and Tripathi (1978, Sec. 3).

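If no information on the second auxiliary variable z is used, then the estimator $\hat{\rho}_{td}$
reduces to $\hat{\rho}_{hd}$ defined in (2.2). Taking z = 1 in (2.8), we get the variance of $\hat{\rho}_{hd}$ to the
first degree of approximation, as

$$Var(\hat{\rho}_{hd}) = Var(r) + \left(\frac{1}{n}-\frac{1}{n_1}\right)\rho_{yx}^2\,[C_x^2h_1^2(1,1) + (\delta_{040}-1)h_2^2(1,1) - Ah_1(1,1) - Bh_2(1,1) + 2\delta_{030}C_xh_1(1,1)h_2(1,1)] \tag{2.12}$$

which is minimized for

$$h_1(1,1) = \frac{A(\delta_{040}-1) - B\,\delta_{030}C_x}{2C_x^2(\delta_{040}-\delta_{030}^2-1)},\qquad
h_2(1,1) = \frac{B\,C_x^2 - A\,\delta_{030}C_x}{2C_x^2(\delta_{040}-\delta_{030}^2-1)}. \tag{2.13}$$

Thus the minimum variance of $\hat{\rho}_{hd}$ is given by

$$\min.Var(\hat{\rho}_{hd}) = Var(r) - \left(\frac{1}{n}-\frac{1}{n_1}\right)\rho_{yx}^2\left[\frac{A^2}{4C_x^2} + \frac{\{(A/C_x)\delta_{030}-B\}^2}{4(\delta_{040}-\delta_{030}^2-1)}\right] \tag{2.14}$$

It follows from (2.11) and (2.14) that

$$\min.Var(\hat{\rho}_{hd}) - \min.Var(\hat{\rho}_{td}) = \frac{\rho_{yx}^2}{n_1}\left[\frac{D^2}{4C_z^2} + \frac{\{(D/C_z)\delta_{003}-F\}^2}{4(\delta_{004}-\delta_{003}^2-1)}\right] \tag{2.15}$$

which is always positive. Thus the proposed estimator $\hat{\rho}_{td}$ is always better than $\hat{\rho}_{hd}$.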

3. A Wider Class of Estimators 

In this section we consider a class of estimators of $\rho_{yx}$ wider than (2.5), given by

$$\hat{\rho}_{gd} = g(r,u,v,w,a) \tag{3.1}$$

where g(r, u, v, w, a) is a function of r, u, v, w, a such that

$$g(\rho_{yx},1,1,1,1) = \rho_{yx} \quad\text{and}\quad \left[\frac{\partial g(\cdot)}{\partial r}\right]_{(\rho_{yx},1,1,1,1)} = 1.$$

Proceeding as in Section 2, it can easily be shown, to the first order of approximation, that
the minimum variance of $\hat{\rho}_{gd}$ is the same as that of $\hat{\rho}_{td}$ given in (2.11).

It is to be noted that the difference-type estimator
$r_d = r + \alpha_1(u-1) + \alpha_2(v-1) + \alpha_3(w-1) + \alpha_4(a-1)$ is a particular case of $\hat{\rho}_{gd}$, but it is
not a member of $\hat{\rho}_{td}$ in (2.5).




4. Optimum Values and Their Estimates

The optimum values $t_1(P) = \alpha$, $t_2(P) = \beta$, $t_3(P) = \gamma$ and $t_4(P) = \delta$ given at
(2.10) involve unknown population parameters. When these optimum values are
substituted in (2.5), it no longer remains an estimator since it involves the unknown
$(\alpha, \beta, \gamma, \delta)$, which are functions of unknown population parameters, say, $\delta_{pqm}$ (p, q, m =
0, 1, 2, 3, 4), $C_x$, $C_z$ and $\rho_{yx}$ itself. Hence it is advisable to replace them by their consistent
estimates from sample values. Let $(\hat{\alpha}, \hat{\beta}, \hat{\gamma}, \hat{\delta})$ be consistent estimators of $t_1(P)$, $t_2(P)$,
$t_3(P)$ and $t_4(P)$ respectively, where



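$$\hat{t}_1(P) = \hat{\alpha} = \frac{\hat{A}(\hat{\delta}_{040}-1) - \hat{B}\,\hat{\delta}_{030}\hat{C}_x}{2\hat{C}_x^2(\hat{\delta}_{040}-\hat{\delta}_{030}^2-1)},\qquad
\hat{t}_2(P) = \hat{\beta} = \frac{\hat{B}\,\hat{C}_x^2 - \hat{A}\,\hat{\delta}_{030}\hat{C}_x}{2\hat{C}_x^2(\hat{\delta}_{040}-\hat{\delta}_{030}^2-1)},$$

$$\hat{t}_3(P) = \hat{\gamma} = \frac{\hat{D}(\hat{\delta}_{004}-1) - \hat{F}\,\hat{\delta}_{003}\hat{C}_z}{2\hat{C}_z^2(\hat{\delta}_{004}-\hat{\delta}_{003}^2-1)},\qquad
\hat{t}_4(P) = \hat{\delta} = \frac{\hat{C}_z^2\hat{F} - \hat{D}\,\hat{\delta}_{003}\hat{C}_z}{2\hat{C}_z^2(\hat{\delta}_{004}-\hat{\delta}_{003}^2-1)},$$

with

$$\hat{A} = [\hat{\delta}_{210}+\hat{\delta}_{030}-2(\hat{\delta}_{120}/r)]\hat{C}_x,\qquad \hat{B} = [\hat{\delta}_{220}+\hat{\delta}_{040}-2(\hat{\delta}_{130}/r)],$$

$$\hat{D} = [\hat{\delta}_{201}+\hat{\delta}_{021}-2(\hat{\delta}_{111}/r)]\hat{C}_z,\qquad \hat{F} = [\hat{\delta}_{202}+\hat{\delta}_{022}-2(\hat{\delta}_{112}/r)],$$

$$\hat{C}_x = s_x/\bar{x},\quad \hat{C}_z = s_z/\bar{z},\quad \hat{\delta}_{pqm} = \hat{\mu}_{pqm}/(\hat{\mu}_{200}^{p/2}\,\hat{\mu}_{020}^{q/2}\,\hat{\mu}_{002}^{m/2}),$$

$$\hat{\mu}_{pqm} = (1/n)\sum_{i=1}^{n}(y_i-\bar{y})^p(x_i-\bar{x})^q(z_i-\bar{z})^m,$$

$$\bar{z} = (1/n)\sum_{i=1}^{n}z_i,\quad s_x^2 = (n-1)^{-1}\sum_{i=1}^{n}(x_i-\bar{x})^2,\quad \bar{x} = (1/n)\sum_{i=1}^{n}x_i,$$

$$r = s_{yx}/(s_ys_x),\quad s_y^2 = (n-1)^{-1}\sum_{i=1}^{n}(y_i-\bar{y})^2,\quad s_z^2 = (n-1)^{-1}\sum_{i=1}^{n}(z_i-\bar{z})^2. \tag{4.1}$$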

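We then replace $(\alpha, \beta, \gamma, \delta)$ by $(\hat{\alpha}, \hat{\beta}, \hat{\gamma}, \hat{\delta})$ in the optimum $\hat{\rho}_{td}$, resulting in the estimator
$\hat{\rho}^*_{td}$, say, which is defined by

$$\hat{\rho}^*_{td} = r\,t^*(u,v,w,a,\hat{\alpha},\hat{\beta},\hat{\gamma},\hat{\delta}), \tag{4.2}$$

where the function $t^*(U)$, $U = (u,v,w,a,\hat{\alpha},\hat{\beta},\hat{\gamma},\hat{\delta})$, is derived from the function
t(u, v, w, a) given at (2.5) by replacing the unknown constants involved in it by the
consistent estimates of the optimum values. The condition (2.6) will then imply that

$$t^*(P^*) = 1 \tag{4.3}$$

where $P^* = (1,1,1,1,\alpha,\beta,\gamma,\delta)$.

We further assume that

$$t_1^*(P^*) = \left.\frac{\partial t^*(U)}{\partial u}\right|_{U=P^*} = \alpha,\quad
t_2^*(P^*) = \left.\frac{\partial t^*(U)}{\partial v}\right|_{U=P^*} = \beta,\quad
t_3^*(P^*) = \left.\frac{\partial t^*(U)}{\partial w}\right|_{U=P^*} = \gamma,\quad
t_4^*(P^*) = \left.\frac{\partial t^*(U)}{\partial a}\right|_{U=P^*} = \delta,$$

$$t_5^*(P^*) = \left.\frac{\partial t^*(U)}{\partial \hat{\alpha}}\right|_{U=P^*} = 0,\quad
t_6^*(P^*) = \left.\frac{\partial t^*(U)}{\partial \hat{\beta}}\right|_{U=P^*} = 0,\quad
t_7^*(P^*) = \left.\frac{\partial t^*(U)}{\partial \hat{\gamma}}\right|_{U=P^*} = 0,\quad
t_8^*(P^*) = \left.\frac{\partial t^*(U)}{\partial \hat{\delta}}\right|_{U=P^*} = 0. \tag{4.4}$$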
Expanding $t^*(U)$ about $P^* = (1,1,1,1,\alpha,\beta,\gamma,\delta)$ in Taylor's series, we have

$$\begin{aligned}
\hat{\rho}^*_{td} = r\,[\,&t^*(P^*) + (u-1)t_1^*(P^*) + (v-1)t_2^*(P^*) + (w-1)t_3^*(P^*) + (a-1)t_4^*(P^*) + (\hat{\alpha}-\alpha)t_5^*(P^*)\\
&+ (\hat{\beta}-\beta)t_6^*(P^*) + (\hat{\gamma}-\gamma)t_7^*(P^*) + (\hat{\delta}-\delta)t_8^*(P^*) + \text{second order terms}\,]
\end{aligned} \tag{4.5}$$

Using (4.3) and (4.4) in (4.5) we have

$$\hat{\rho}^*_{td} = r\,[1 + (u-1)\alpha + (v-1)\beta + (w-1)\gamma + (a-1)\delta + \text{second order terms}] \tag{4.6}$$

Expressing (4.6) in terms of e's, squaring and retaining terms of e's up to second degree,
we have

$$(\hat{\rho}^*_{td}-\rho_{yx})^2 = \rho_{yx}^2\left[\tfrac{1}{2}(2e_5-e_0-e_2) + \alpha(e_1-e_1^*) + \beta(e_2-e_2^*) + \gamma e_3^* + \delta e_4^*\right]^2 \tag{4.7}$$



Taking expectation of both sides in (4.7), we get the variance of $\hat{\rho}^*_{td}$ to the first degree of
approximation, as

$$Var(\hat{\rho}^*_{td}) = Var(r) - \left(\frac{1}{n}-\frac{1}{n_1}\right)\rho_{yx}^2\left[\frac{A^2}{4C_x^2} + \frac{\{(A/C_x)\delta_{030}-B\}^2}{4(\delta_{040}-\delta_{030}^2-1)}\right] - \frac{\rho_{yx}^2}{n_1}\left[\frac{D^2}{4C_z^2} + \frac{\{(D/C_z)\delta_{003}-F\}^2}{4(\delta_{004}-\delta_{003}^2-1)}\right] \tag{4.8}$$

which is the same as (2.11). We have thus established the following result.

Result 4.1: If the optimum values of the constants in (2.10) are replaced by their consistent
estimators and conditions (4.3) and (4.4) hold good, the resulting estimator $\hat{\rho}^*_{td}$ has the
same variance, to the first degree of approximation, as that of the optimum $\hat{\rho}_{td}$.



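Remark 4.1: It may be easily verified that some special cases:

(i) $\hat{\rho}^*_{td1} = r\,u^{\hat{\alpha}}v^{\hat{\beta}}w^{\hat{\gamma}}a^{\hat{\delta}}$

(ii) $\hat{\rho}^*_{td2} = r\,\dfrac{\{1+\hat{\alpha}(u-1)+\hat{\gamma}(w-1)\}}{\{1-\hat{\beta}(v-1)-\hat{\delta}(a-1)\}}$

(iii) $\hat{\rho}^*_{td3} = r\,[1+\hat{\alpha}(u-1)+\hat{\beta}(v-1)+\hat{\gamma}(w-1)+\hat{\delta}(a-1)]$

(iv) $\hat{\rho}^*_{td4} = r\,[1-\hat{\alpha}(u-1)-\hat{\beta}(v-1)-\hat{\gamma}(w-1)-\hat{\delta}(a-1)]^{-1}$

of $\hat{\rho}^*_{td}$ satisfy the conditions (4.3) and (4.4) and attain the variance (4.8).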
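The ratio-type, regression-type and reciprocal forms of Remark 4.1 can be sketched in Python as follows. The function names and the argument names `al`, `be`, `ga`, `de` (standing for the estimated constants) are illustrative assumptions. The sketch takes r, u, v, w, a and the constants as already computed.

```python
def rho_td2(r, u, v, w, a, al, be, ga, de):
    """Ratio-type form: r {1 + al(u-1) + ga(w-1)} / {1 - be(v-1) - de(a-1)}."""
    return r * (1 + al * (u - 1) + ga * (w - 1)) / (1 - be * (v - 1) - de * (a - 1))


def rho_td3(r, u, v, w, a, al, be, ga, de):
    """Regression-type form: r [1 + al(u-1) + be(v-1) + ga(w-1) + de(a-1)]."""
    return r * (1 + al * (u - 1) + be * (v - 1) + ga * (w - 1) + de * (a - 1))


def rho_td4(r, u, v, w, a, al, be, ga, de):
    """Reciprocal form: r [1 - al(u-1) - be(v-1) - ga(w-1) - de(a-1)]^(-1)."""
    return r / (1 - al * (u - 1) - be * (v - 1) - ga * (w - 1) - de * (a - 1))
```

At u = v = w = a = 1 all three forms return r, and for ratios close to 1 they differ only in second-order terms, which is consistent with their sharing the same first-order variance.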

Remark 4.2: The efficiencies of the estimators discussed in this paper can be compared
for fixed cost, following the procedure given in Sukhatme et al. (1984) and Gupta et al.
(1992-93).

5. Empirical Study

To illustrate the performance of various estimators of the population
correlation coefficient, we consider the data given in Murthy [1967, p. 226]. The variates
are:

y = output, x = number of workers, z = fixed capital,
N = 80, n = 10, $n_1$ = 25,

$\bar{X}$ = 283.875, $\bar{Y}$ = 5182.638, $\bar{Z}$ = 1126, $C_x$ = 0.9430, $C_y$ = 0.3520, $C_z$ = 0.7460,

$\delta_{003}$ = 1.030, $\delta_{004}$ = 2.8664, = 1.1859, = 3.1522, $\delta_{030}$ = 1.295, $\delta_{040}$ = 3.65,

$\delta_{102}$ = 0.7491, $\delta_{120}$ = 0.9145, $\delta_{111}$ = 0.8234, $\delta_{130}$ = 2.8525,

$\delta_{112}$ = 2.5454, $\delta_{210}$ = 0.5475, $\delta_{220}$ = 2.3377, $\delta_{201}$ = 0.4546, $\delta_{202}$ = 2.2208, $\delta_{300}$ = 0.1301,
$\delta_{400}$ = 2.2667, $\rho_{yx}$ = 0.9136, $\rho_{xz}$ = 0.9859, $\rho_{yz}$ = 0.9413.

The percent relative efficiencies (PREs) of r, $\hat{\rho}_{hd}$ and $\hat{\rho}_{td}$ with respect to the conventional
estimator r have been computed and compiled in Table 5.1.



Table 5.1: The PREs of different estimators of $\rho_{yx}$

Estimator      r        $\hat{\rho}_{hd}$      $\hat{\rho}_{td}$ (or $\hat{\rho}^*_{td}$)

PRE(., r)      100      129.147                305.441



Table 5.1 clearly shows that the proposed estimator $\hat{\rho}_{td}$ (or $\hat{\rho}^*_{td}$) is more efficient
than r and $\hat{\rho}_{hd}$.
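For readers who want to relate the tabulated PREs back to variances, the short Python sketch below uses the usual definition PRE(estimator, r) = 100 Var(r) / Var(estimator). That definition is assumed here, since the text does not spell it out, and the function names are illustrative.

```python
def pre(var_r, var_est):
    """Percent relative efficiency of an estimator with respect to r."""
    return 100.0 * var_r / var_est


def variance_ratio_from_pre(pre_value):
    """Var(estimator) / Var(r) implied by a given PRE."""
    return 100.0 / pre_value
```

Under this definition, a PRE of 305.441 means the first-order variance of the estimator is roughly one third of the variance of r.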